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Abstract 

Recombination has the potential to facilitate adaptation. In spite of the substantial body of theory on the impact of 
recombination on the evolutionary dynamics of adapting populations, empirical evidence to test these theories is still 
scarce. We examined the effect of recombination on adaptation on a large-scale empirical fitness landscape in HIV-1 based 
on in vitro fitness measurements. Our results indicate that recombination substantially increases the rate of adaptation 
under a wide range of parameter values for population size, mutation rate and recombination rate. The accelerating effect 
of recombination is stronger for intermediate mutation rates but increases in a monotonic way with the recombination rates 
and population sizes that we examined. We also found that both fitness effects of individual mutations and epistatic fitness 
interactions cause recombination to accelerate adaptation. The estimated epistasis in the adapting populations is 
significantly negative. Our results highlight the importance of recombination in the evolution of HIV-1. 
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Introduction 

Recombination, here broadly defined as the shuffling of genetic 
material, is widespread in nature and occurs among a wide range 
of taxa, including most eukaryotes but also bacteria and viruses. It 
has long been believed that sex and recombination facilitate 
adaptation by increasing the genetic variance upon which natural 
selection can act [1]. However, recombination can also reduce 
variation if there is a preponderance of co-adapted allelic 
associations. This cost, referred to as recombination load, arises 
because recombination tends to unravel combinations of genes 
that are favored by selection, thus impeding adaptation [2-4]. 
Several hypotheses have been developed to account for the 
conditions under which recombination accelerates adaptation [4- 
8], Epistasis-based hypotheses state that if beneficial mutations 
increase fitness less in combination than expected based on their 
individual effects (negative epistasis), recombination can accelerate 
adaptation by increasing genetic variance and thus enhancing the 
efficacy of selection [6,7,9]. In contrast, recombination is predicted 
to decelerate adaptation with positive as well as sign epistasis (i.e., 
when the direction of selection on an allele depends on the allelic 
status at other loci) [10-12]. According to another class of 
hypotheses, random genetic drift resulting from finite population 
size is believed to provide an advantage to recombination [4,13- 
15]. This occurs because in finite populations, beneficial mutations 



are likely to occur on different backgrounds and compete with 
each other, thus reducing selection efficacy. Recombination 
alleviates this competition by bringing beneficial mutations 
together on the same background and consequently speeds up 
adaptation (the Fisher-Muller effect) [14—16]. 

Prior experimental studies have demonstrated that sex and 
recombination can facilitate adaptation [17-21]. Yet, our under- 
standing of the costs and benefits of recombination during 
adaptation on empirical complex fitness landscapes is still limited. 
The structure of the underlying fitness landscape is a decisive 
factor for the effect of recombination on adaptation since it 
determines how natural selection creates non-random combina- 
tions of alleles. Although realistic fitness landscapes are believed to 
exhibit a complex structure characterized by intricate patterns of 
fitness interactions among genes, not much is known about the 
structure of large-scale fitness landscapes. This lack of knowledge 
hampered efforts to obtain insights into the effect of recombination 
on adaptation in realistic situations. To our knowledge, only one 
study explores the effect of recombination on an empirical fitness 
landscape [22], but this fitness landscape comprises only six loci, 
and thus, the results may not be generalizable to larger landscapes. 

Recently, Hinkley et al. [23] have estimated a fitness landscape 
in HIV-1 for 1859 mutations, based on an in-vitro assay for viral 
replicative capacity. This empirical fitness landscape, by far the 
largest available empirical database characterizing epistatic 
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Author Summary 

One of the most challenging issues in evolutionary biology 
concerns the question of why most organisms exchange 
genetic material with each other, e.g. during sexual 
reproduction. Gene shuffling can create genetic diversity 
that facilitates adaptation to new environments, but 
theory shows that this effect is highly dependent on 
how different genes interact in determining the fitness of 
an organism. Using a large data set of fitness values based 
on HIV-1, we provide evidence that shuffling of genetic 
material indeed raises the level of genetic diversity, and as 
a result accelerates adaptation. Our results also propose 
genetic shuffling as a mechanism utilized by HIV to 
accelerate the evolution of multi-drug-resistant strains. 

interactions, allows us for the first time to scrutinize the impact of 
recombination on the evolutionary dynamics of an adapting 
population on a realistic fitness landscape. Through simulations on 
this fitness landscape, we examined the effect of recombination on 
adaptation under different conditions. We found benefits of 
recombination are sufficiently high to accelerate adaptation under 
a wide range of parameters. Our findings highlight the important 
evolutionary role of recombination in adaptation, and in 
particular, in HIV evolution. 

Results 

Recombination was found to produce a substantial increase in 
the rate of adaptation (Figure 1A). This effect was robust with 
respect to the initial composition of the population, as it is 
observed not only when starting from the reference sequence but 
also when initializing the population with a random sequence 
(Figure SI and Section SI in Text SI). The acceleration of 
adaptation by recombination can be attributed to increased 
genetic variance in fitness, which in turn enhances the efficacy of 
natural selection, as proposed by the fundamental theorem of 
natural selection [14]. Indeed, we see a markedly stronger increase 
in genetic variance in fitness over time in the recombining, 
compared to the non-recombining population (Figure IB). These 
results on population fitness are also in agreement with patterns of 
genetic diversity within the evolving populations: the recombining 
populations accumulate within-population diversity faster than the 
non-recombining populations (Figure S2A and Section S2 in Text 
SI), and they diverge faster from the initial sequence (Figure S2B 
and Section S2 in Text SI). 

We also observed that with increasing recombination rate the 
fittest genotypes that emerged by the end of each simulation 
became more divergent from each other across replicate runs 
(Figure S3A and Section S3 in Text SI). However, it appears that 
this diversifying effect of recombination is primarily a consequence 
of the fact that recombination increases the rate at which the 
populations adapt and traverse the fitness landscape: when 
sequences are compared when the population reaches a certain 
threshold mean fitness, recombination has little effect on 
divergence (Figure S3B and Section S3 in Text SI). In other 
words, recombination accelerates adaptation but does not increase 
the number of evolutionary trajectories available to the popula- 
tions in the course of adaptation. 

We also considered how different drug treatments affect the 
impact of recombination on adaptation. To this end, we measured 
the effect of recombination using fitness landscapes obtained for 16 
environments with different drug treatments. Our results indicate 
that the effect of recombination is markedly stronger in the 



presence of antiviral drugs compared to the drug-free environment 
(Figure S4 and Section S4 in Text SI), which appears to be due to 
the higher selection pressure in environments with drug treatments 
(results not shown). 

We next explored how the population size and mutation rate 
affect the extent to which recombination accelerates adaptation. 
Figure 2A indicates that, for a given population size, adaptation is 
already accelerated with moderate recombination rates and this 
effect increases monotonically with increasing recombination 
rates. In contrast, the effect of recombination depends non- 
monotonically on mutation rate (Figure 2B): whereas modest 
mutation rates enhance the accelerating effect of recombination on 
adaptation, at very high mutation rates this effect is reduced (see 
Discussion). Finally, our results indicate that with increasing 
population size, the accelerating effect of recombination becomes 
stronger (Figure 2A and S5 and Section S5 in Text SI). This is 
because with increasing population size, more beneficial mutations 
co-segregate in the population on different backgrounds and as a 
result the Fisher-Muller effect is enhanced. We expect that in very 
large populations this acceleration would become weaker again 
(because then all combinations of beneficial alleles would be 
present in the population even without recombination), but in the 
range of population sizes that was computationally feasible this 
was not observed. We also found the effect of recombination for 
populations with the same population mutation rate (population 
size xmutation rate) to be dependent on the population size: the 
effect is maximized for intermediate values of the population 
mutation rate and this maximum occurs at higher values for larger 
populations (Figure S6 and Section S6 in Text SI). This is 
presumably because for larger populations a higher mutation rate 
is required for multiple beneficial mutations to occur on the same 
background and thereby mitigate the Fisher-Muller effect. 

Our fitness model incorporates both main and pairwise epistatic 
effects, both of which are known to influence the effect of 
recombination on the rate of adaptation. To assess the relative 
contribution of these effects, we simulated adaptation on fitness 
landscapes where, starting from the original M HL fitness matrix, 
we decreased the epistatic and main effects by varying amounts. 
Figure 3 indicates that both main effects and epistatic effects 
contribute to the acceleration of adaptation and that in 
combination these two effects appear to operate additively. Similar 
results were obtained for smaller recombination rates (Figure S7). 
We also found that both main effects and epistatic interactions 
increase the rate of adaptation and the effect of recombination 
becomes stronger with increasing adaptation rate (results not 
shown). These findings suggest that both main and epistatic effects 
can enhance selection. This seems to result in stronger interfer- 
ences between arisen beneficial mutations and therefore a higher 
advantage of recombination (see Discussion). 

We next determined the predominant form of epistasis in the 
adapting populations. To infer epistasis, one can use the fitness 
values of a set of sequences that are chosen irrespective of the 
composition of the evolving population and that therefore may not 
represent the sequences formed during adaptation ('physiological 
epistasis'). Alternatively, only fitness values of sequences that are 
present in the adapting population can be utilized to estimate 
epistasis. This form of epistasis, referred to as population epistasis, 
provides a real time estimate of the epistasis that is responsible for 
generating the standing linkage disequilibrium in the population, 
and is therefore more accurate than physiological epistasis (see 
Discussion). We calculated population epistasis by regressing log 
fitness W against Hamming distance, i.e., the number of sites where 
the corresponding sites are different between two sequences. The 
regression was done for n sequences that are present in the 
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Figure 1. A) Population fitness and B) fitness variance in the recombining and non-recombining populations over time. Red and blue 
curves correspond to the mean value across 100 simulations in the recombining and non-recombining settings, respectively. Fitness and fitness 
variance values are normalized with the fitness of the reference sequence. The shaded regions show 95% confidence intervals. Parameters take 
values: 7V = 10 5 , i^IO" 1 and m=10 -3 . 
doi:1 0.1 371 /journal.pgen.1 004439.g001 



population at the end of simulation according to w(ri) = an + [In 2 , 
where Hamming distance n is measured relative to the reference 
sequence. The parameter /?, determining the curvature, is used as 
a measure of epistasis. Our results indicate that in the majority of 
simulations, population epistasis is significant (ANOVA test for 
comparison of a quadratic and a linear model, p« 0.001 for the 
simulations in Figure 4) and predominantly negative, indicating 
diminishing returns with each additional beneficial mutation in 
increasing fitness. Population epistasis in the recombining popu- 
lation becomes less negative on average than in the non- 
recombining population (Wilcoxon test for the significance of 
epistasis in Figure 4, p« 0.001 and for the significance of the 
difference between recombining and non-recombining simulations 
Figure 4B, p« 0.001). We obtained similar results for populations 
at other time points (results not shown). 

Thus far, we considered the effect of recombination on 
adaptation by comparing evolving populations characterized by 
different recombination rates (including the absence of recombi- 
nation). To examine whether recombination is selectively favored 
within an adapting population, we performed additional simula- 
tions in which we competed a resident non-recombining with an 
invading recombining genotype during adaptation. Figure S8A 



shows that the frequency of the recombining genotype gradually 
increases over time if the recombination rate is high enough 
(r>0.01). In accordance with our previous findings, the results of 
the invasion analysis demonstrate that the benefit of recombina- 
tion is most pronounced for intermediate mutation rates (Figure 
S8B and S8C and Section S7 in Text SI). 

Discussion 

Our results can be interpreted as support for the proposed 
accelerating role of recombination in the adaptive process through 
the Fisher-Muller effect. In our simulations, this effect seems to be 
sufficiently strong to outweigh potential costs of recombination. 
The Fisher-Muller effect is based on strong selective interference 
between beneficial mutations in an asexual (non-recombining) 
population. Previous mathematical models have provided impor- 
tant insights into how the strength of selection, mutation rate and 
population size affect selective interference and the Fisher-Muller 
effect, but these models have ignored epistatic interactions 
between mutations [24—30]. Our results demonstrate that both 
epistatic interactions and main fitness effects contribute to the 
accelerating effect of recombination. It is clear that this effect in 
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Recombination rate Mutation rate 

Figure 2. Effect of model parameters on the impact of recombination on adaptation. Each bar corresponds to the relative fitness of the 
recombining to the non-recombining population at generation 100, measured by taking the logarithm (base 10) of the ratio of mean fitness of the 
recombining population to that of the non-recombining population and averaged across 100 simulations. The blue, green, orange and red bars 
correspond to population sizes 10 4 , 5 x 10 4 , 10 5 and 5 x 10 5 , respectively. The error bars show the standard deviation of the difference between two 

log normal distributions {Jtf 0 g W ~ j + n \ ai (w vl )■ A) The effect of recombination rate at different population sizes for mutation rate: u= 10~ 3 . B) 

The effect of mutation rate at different population sizes for recombination rate: r = 10 -1 , (The effect of recombination is still significant for 
populations larger than 10 3 with this recombination rate; see Figure S5). 
doi:1 0.1 371 /journal.pgen.1 004439.g002 
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Figure 3. Contribution of the main and epistatic effects to the 
recombination effect on adaptation on the HIV-1 fitness 
landscape. The x and y axes show the values of the constants by 
which the elements of the main and epistatic effects, respectively, are 
multiplied in the hierarchical fitness landscape. The plot shows the 
logarithm (base 10) of the ratio of mean fitness of the recombining to 
that of the non-recombining populations at generation 100, averaged 
across 100 simulations. Parameters take values: A^IO 5 , r=10 _1 and 

K=1(T 3 . 

doi:1 0.1 371 /journal.pgen.1 004439.g003 

the absence of epistatic interactions is due to the Fisher-Muller 
effect. Adding epistatic interactions to this model enhances the 
effect of recombination but the interpretation of why this occurs is 
challenging. On the one hand, this may be because the epistatic 
interactions increase the overall strength of selection and thereby 
produce stronger clonal interference, but on the other hand we 
cannot exclude explanations based purely on epistasis (see below). 

One important determinant of the Fisher-Muller effect is the 
mutation rate. With a finite number of loci, an increasing mutation 
rate leads to a higher number of co-segregating beneficial 



mutations and this augments the Fisher-Muller effect. However, 
at very high mutation rates, it becomes increasingly likely that 
several beneficial mutations arise on the same genetic background 
so that recombination becomes less important (Figure 2B). A 
similar effect is expected for population size (with finite sites for 
beneficial mutations) but for the range of population sizes we 
examined here the effect of population size was monotonic. 

In addition to the Fisher-Muller effect, recombination can also 
accelerate adaptation in the absence of random genetic drift when 
there is epistasis [9,31]. Several studies have attempted to 
determine the prevailing form of epistasis in nature but have 
yielded inconsistent results: sometimes strong positive epistasis 
[32-36], negative epistasis [37,38] or pervasive sign epistasis [39- 
42] was reported. The HIV-I fitness landscape is characterized by 
extensive epistatic interactions [23]. We demonstrated that 
population epistasis during adaptation is predominately negative. 
This result is in apparent contrast with the predominant positive 
epistasis in HIV-I sequences reported in Bonhoeffer et al. [33]. We 
think this discrepancy arises mainly because Bonhoeffer et al. 
estimated physiological epistasis, which is based only on the 
structure of fitness landscape itself and may be very different from 
population epistasis that we estimated here [43,44] . The difference 
between the two estimates is that in determining population 
epistasis only the mutations that pass the sieve of natural selection 
are taken into account, whereas in measuring physiological 
epistasis all mutations are used indiscriminately. In addition to 
these different measures of epistasis, the two studies also differ in 
the way fitness was estimated. First, Bonhoeffer et al. [33] obtained 
fitness values from a much smaller data set than was used to 
estimate the fitness landscape in our study (9466 vs. 70,081 
sequences). Second, in their study the main effects of a mutation 
and the epistatic effects for a given pair of mutations were 
calculated by averaging over the fitness effects of other mutations 
in the genetic background. By contrast, we obtained fitness using a 
predictive fitness model [23] that explicitly accounts for mutational 
effects in different genetic backgrounds during estimation of the 
fitness landscape, and therefore provides a more accurate estimate. 

It is tempting to interpret the significant negative population 
epistasis that we observed as a support for the mechanism 
proposed by the mutational deterministic hypothesis, i.e. acceler- 




Hamming Distance 

Figure 4. Population epistasis of the HIV-1 fitness landscape. A) An example of the log fitness plot against the Hamming distance between 
the reference sequence and sequences at generation 100 of one simulation for 20000 data points. Each point corresponds to one fitness value. To 
improve the visualization of the data on x axis, random values taken from a normal distribution with mean 0 and standard deviation 0.1 are added to 
the Hamming distances. The curvature of the red curved fitted to the data points defines population epistasis, which in this example is —0.0405. B) 
Distribution of population epistasis values across 500 simulations in the absence (blue) and presence (red) of recombination measured at generation 
100. Parameters take values: jV = 10 5 , r = 1 0 1 and !/=10~ 3 . 
doi:1 0.1 371 /journal.pgen.1 004439.g004 
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ation of adaptation through reduction of negative linkage 
disequilibria generated by negative epistasis. However, we would 
like to caution that it is very difficult to explain how population 
epistasis arises in our model and how it impacts the effect of 
recombination. For example, population epistasis on a complex 
fitness landscape can also be generated by variation in main fitness 
effects of mutations. The underlying causes of negative epistasis in 
our model and the extent to which it contributes to the 
accelerating effect of recombination (in isolation from the Fisher- 
Muller effect) is difficult to determine on a complex fitness 
landscape because deterministic models are not feasible. There- 
fore, we cannot exclude the possibility that stochastic benefits of 
recombination due to the Fisher-Muller effect may be sufficiendy 
strong to override any direct effect of epistasis in our simulations, 
as reported by other studies [1 1,45,46]. 

Our study is based on an estimated fitness landscape and 
therefore the limitations of this approach should be taken into 
account while interpreting the results. First, in the fitness landscape 
that we used only main and pairwise epistatic effects were 
estimated but higher order fitness interactions (>2) were 
neglected. To accurately estimate higher orders epistasis, a much 
larger number of sequences with measured fitness values would be 
required. It is not clear what the strength of these higher-order 
interactions is and how they affect the impact of recombination. 
Second, although the accuracy of our fitness model (predicting 
54.8% of the variance; see Methods) is acceptable as the only 
available fitness landscape, the predicted fitness landscape is yet to 
become more realistic by using a greater number of empirical fitness 
data. This is important as the estimated fitness values become 
increasingly unreliable for the regions in the sequence space far 
from the reference sequence due to the lack of data. To account for 
this problem, we focused on the population dynamics in the region 
of the fitness landscapes that is close to the reference sequence. 
Finally, the empirical fitness data used to predict the structure of the 
fitness landscape was obtained from an in vitro assay, and therefore 
might not completely correspond to in vivo fitness values. 

It should also be noted that this study mainly examines the effect 
of recombination at the population level and does not address the 
evolution of recombination rate. One interesting extension of this 
work would be to incorporate variation in recombination rates in 
the model and study the spread of a recombination modifier gene 
in a non-recombining population. One problem with using the 
modifier approach in a realistic way with the current fitness model 
is that probably the best candidate for a recombination modifier 
gene in the HIV-1 genome is the reverse transcriptase gene [see 
47], which is itself part of the fitness landscape and therefore 
changes during adaptation because of direct selection. 

Our study relates to the debate over the advantage of 
recombination in retroviral, and in particular HIV, evolution. 
Recombination is believed by some to be beneficial because it 
generates genetic diversity to facilitate the development of 
multidrug resistance [48—50] or escape from host immune reaction 
[51]. Nonetheless, some studies have suggested that recombination 
has not evolved to facilitate adaptation but is a mere by-product of 
other mechanisms such as genomic organization [reviewed in 52, 
see also 45]. Unlike some prior studies [53-55], our model does 
not include any specific feature of HIV biology, such as viral 
dynamics during infection or specificities of recombination in 
HIV. Nonetheless, we believe that our findings are generic enough 
to highlight the potential role of recombination in accelerating 
HIV evolution. 

This study utilizes data derived from a high-throughput fitness 
assay to address one of the long-standing questions in evolutionary 
biology. The advent of systems biology approaches made it 



possible to obtain a comprehensive picture of a large-scale fitness 
landscape. This serves as a framework for us to demonstrate that 
recombination has a substantial accelerating effect on adaptation 
on a realistic complex fitness landscape. 

Methods 

Fitness Model 

Our model is based on a recently estimated fitness landscape of 
HIV-1 [23]. Briefly, to obtain this fitness landscape, the in vitro 
replicative capacity of 70,081 samples from HIV-1 subtype B 
infected individuals were measured and the corresponding amino 
acid sequences of the protease and partial sequences of the reverse 
transcriptase were obtained for all of these samples. This enabled 
estimation of the fitness effects of 1,857 single mutations and 
257,536 pairs of mutations in these samples by fitting a fitness 
model to the data. This fitness model, as detailed in Hinkley et al. 
[23], invokes a generalized kernel ridge regression (GKRR) 
method to estimate the fitness effect of individual amino acid 
variants and the epistatic effects between variants. 

Based on these results, we used the following fitness model to 
obtain fitness values for a given sequence: 

\og(W x ) = x T Mx 

Here, the amino acid sequence x is a binary vector indicating 
the presence or absence of amino acid variants. M is a triangular 
matrix where an entry M;; on the diagonal determines the main 
effects of the amino acid variant i and the off-diagonal entries M ; j 
(with i>j) determine pairwise epistatic effects between variant i 
and j. 

Higher order epistatic interactions were not considered. Note 
that the original model in Hinkley et al. [23] also includes an 
intercept term that gives the log fitness of a reference sequence 
(NL4-3) and that is added to Equation (1). However, since natural 
selection only depends on relative fitness in our model, this term 
was not considered in our simulation setting. 

We used two different types of matrices M determining fitness. 
The first matrix type, M RL , describing the 'reference fitness 
landscape', was obtained by Hinkley et al. [23] by estimating both 
main and epistatic effects simultaneously. This estimation was 
done in 16 different environments: one drug-free environment and 
15 environments each characterized by the presence of a different 
antiretroviral drug. On average, this matrix predicts 54.8% of the 
variance in fitness across different environments. Unless stated 
otherwise, we use the fitness landscape in the drug-free environ- 
ment as the reference fitness landscape in our simulations. 

To obtain the second matrix type, M , describing a 
'hierarchical fitness landscape', two fitting steps were performed 
[56]. In the first step, M HL was estimated by assuming that there 
are only main effects (all off-diagonal elements set to zero), and in 
the second step, epistatic effects were estimated by fitting the 
residuals under the assumption that main effects are absent. This 
fitness landscape was obtained only for the drug free environment. 
Since the main and epistatic effects are estimated separately for 
these fitness landscapes, this approach allowed us to generate 
fitness landscapes where we could scale the magnitudes of main 
and epistatic effects and thus evaluate their relative contribution 
with respect to the effects of recombination. Hierarchical fitness 
landscapes with different magnitudes of epistatic effects were 
shown to provide accurate predictions of the reference fitness 
landscape [56]. 
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For details about the estimation procedures, we refer to Hinkley 
et al. [23] and Kouyos et al. [56]. 

Simulation Framework 

We consider a discrete-time model based on the classic the 
Wright-Fisher model to simulate adaptive evolution under 
mutation, recombination and natural selection on the HIV-1 
fitness landscape. The population consists of a constant number of 
TV amino acid sequences, each of which contains the protease, as 
well as a partial reverse transcriptase, gene. Initially, this 
population is monomorphic, consisting only of N copies of a 
reference sequence (NL4-3). 

In each generation, the new population is formed from the 
previous one through three steps. First, reproduction and selection 
are implemented through random sampling of N sequences, 
weighted according to the relative fitness value of each sequence. 
Second, to implement mutation events, Nu sequences are 
randomly chosen from the population with replacement (u denotes 
the per genome mutation rate), thus allowing for several mutations 
per sequence. For Nu<\, this number is treated as a random 
number with mean Nu. For each of these sequences, the allele at a 
randomly selected site for which there exists more than one 
possible variant is substituted with one of the other possible allelic 
variant. Amino acid variants at a given site that are not present in 
the data set used to estimate fitness are neglected. In the final step, 
selected sequences undergo homologous recombination. We 
denote the recombination rate by r. Here, Nr pairs of sequences 
are chosen randomly (without replacement) and for each of these 
pairs, a single crossover site at which the two parental sequences 
exchange genetic material is determined at random. Note that 
recombination may result in daughter sequences that are identical 
to the parental sequences if identical pieces are exchanged. 

The simulations were run for 100 generations. This period is 
long enough for the population to adapt but the adapting 
population still remains in the proximity of the reference sequence, 
where due to the availability of empirical fitness data, the 
estimation of the fitness by our model is reliable. To examine 
the effect of recombination on adaptation, we computed the ratio 
of the logarithm (base 10) of the population mean fitness of a 
population evolving with recombination to that of a population 
evolving without recombination. In this case, finding a proper 
definition of error bars is not straightforward since the data in 
question are the logarithms of ratios. However, this logarithm 
can be written as a difference Qog(W w / rec / W w i orec ) = 
\og\W w / re A ~\og(W w / 0 rec )), so that we can use the standard 
deviation of the difference between two log normal distributions, 

^/°fog( ~w i ) + °fog( w i )> as error bars. This is justified because 

we found the log fitness values of sequences at generation 100 
across 100 simulations to be normally distributed (for instance, 
non-significant Shapiro- Wilk and Anderson-Darling test to reject 
normal distribution for results in Figure 1, with p>0.05). 

Supporting Information 

Figure SI Relative population fitness with vs. without recom- 
bination (as measured in Fig. 2) across 100 simulations over 100 
generations, with 50 random initial populations and three 
mutation rates: A) w=10~ 4 B) «=10~ 3 and C) « = 5xl0~ 3 . 
The curves show the mean value and the shaded region shows the 
95% confidence interval. Other parameters take the values: 
iV=10 5 , r=l0-\ 
(TIFF) 



Figure S2 Effect of recombination on A) sequence variation and 
B) sequence divergence over time. A) Box plot of sequence 
variation over time in the recombining (red) and non-recombining 
(blue) populations as measured by the mean value of pairwise 
Hamming distances between sequences of a sample of 100 
sequences in 100 simulations every 10 generations. The boxes give 
the interquartile range, the whiskers indicate the boundary of 1 .5 
times the interquartile range, and the points beyond that are 
outliers. B) Box plot of divergence of sequences from the 
initial population over time in the recombining (red) and 
non-recombining (blue) populations as measured by the mean 
values of the Hamming distances between a sample of 1000 
sequences with the reference sequence for 100 simulations every 
10 generations. The boxes, outliers and whiskers are defined the 
same as in panel A. Parameters values: JV=10 5 , r=\0~ l and 
M =10-3. 
(TIFF) 

Figure S3 Hamming distance between the fittest sequences 
formed at A) generation 100 and B) generation at which the 
population mean fitness exceeds 3.16 times the fitness of the 
reference sequence across 100 simulations at different recombi- 
nation rates. The outliers and whiskers are defined as in Figure S5. 
Parameters take values N = 10 5 and u= 10~ 3 . 
(TIFF) 

Figure S4 Effect of recombination on adaptation on fitness 
landscapes across different environments characterized by the 
absence of drugs or by the presence of a single antiretroviral drug. 
Color indicates drug-class (red: no drug; cyan: non-nucleoside 
reverse transcriptase inhibitor; blue: nucleoside analog reverse 
transcriptase inhibitor; green: protease inhibitor). Each bar shows 
the logarithm value (base 10) of the ratio of the average of 
population fitness values across 100 simulations in the recombin- 
ing population to that in the non-recombining population at 
generation 100. The error bars are defined as in Figure 2. 
Parameter values: N = 10 5 , r= 10~' and u= 10~ 3 . 
(TIFF) 

Figure S5 Effect of mutation rate on the impact of recombina- 
tion on adaptation for smaller populations. Each bar shows the 
logarithm value (base 10) of the ratio of the average of population 
fitness values across 100 simulations in the recombining popula- 
tion to that in the non-recombining population at generation 100. 
The blue, green, yellow, orange and red bars correspond to 
population sizes 4x 10 2 , 10 3 , 4x 10 3 , 6x 10 3 and 10 4 , respec- 
tively. The error bars are defined as in Figure 2. Recombination 
rate: r= 10 -1 . 
(TIFF) 

Figure S6 Effect of mutation rate on the impact of recombi- 
nation on adaptation for population with the same population 
mutation rates (population size (N) x mutation rate («)). Each 
bar shows the logarithm value (base 10) of the ratio of the 
average of population fitness values across 100 simulations in the 
recombining population to that in the non-recombining 
population at generation 100. The blue, green and orange bars 
correspond to population sizes 10 3 , 10 4 and 10 5 , respectively. 
The error bars are defined as in Figure 2. Recombination rate: 
r=10 _1 . 
(TIFF) 

Figure S7 Contribution of the main and epistatic effects to the 
recombination effect on adaptation on the HIV- 1 fitness landscape 
at two recombination rates A) r= 10~ 3 and B) r= 10~ 2 . The x 
and y axes show the values of the constants by which the elements 
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of the main and epistatic effects, respectively, are multiplied in the 
hierarchical fitness landscape. The plot shows the logarithm (base 
10) of the ratio of the mean fitness of the recombining to that of the 
non-recombining populations at generation 100, averaged across 
100 simulations. Parameters take the values: TV = 10 s and 

M =io- 3 . 

(TIFF) 

Figure S8 Invasion of the non-recombining population by the 
recombining type. The initial frequency of the recombining type is 
1% of total the population size. The initial population is 
monomorphic, consisting only of the reference sequence. A) 
Frequency of the recombining type that invades a non-recombin- 
ing population during adaptation, averaged across 500 simula- 
tions. The shaded region shows the 95% confidence interval. 
Parameters values: N= 10 5 , r= 1 and u= 1(T 3 B), C), D) and E) 
Mean frequencies of the recombining population at generation 
1000 across 500 simulations for different recombination rates A) 
r= \ B) r = 0.5 C) r = 0.1 and D) r = 0.01. Each bar shows the 
mean final frequencies of the recombining population across 500 
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